An Efficient Approach for Text Clustering Based on Frequent Itemsets
نویسنده
چکیده
In recent times, the vast amount of textual information available in electronic form is growing at staggering rate. This increasing number of textual data has led to the task of mining useful or interesting frequent itemsets (words/terms) from very large text databases and still it seems to be quite challenging. The use of such frequent itemsets for text clustering has received a great deal of attention in research community since the mined frequent itemsets reduce the dimensionality of the documents drastically. In the proposed research, we have devised an efficient approach for text clustering based on the frequent itemsets. A renowned method, called Apriori algorithm is used for mining the frequent itemsets. The mined frequent itemsets are then used for obtaining the partition, where the documents are initially clustered without overlapping. Furthermore, the resultant clusters are effectively obtained by grouping the documents within the partition by means of derived keywords. Finally, for experimentation, the Reuter-21578 dataset are used and thus the obtained outputs have ensured that the performance of the proposed approach has been improved effectively.
منابع مشابه
Performance Evaluation of an Efficient Frequent Item sets-Based Text Clustering Approach
The vast amount of textual information available in electronic form is growing at a staggering rate in recent times. The task of mining useful or interesting frequent itemsets (words/terms) from very large text databases that are formed as a result of the increasing number of textual data still seems to be a quite challenging task. A great deal of attention in research community has been receiv...
متن کاملClustering Zebrafish Genes Based on Frequent-Itemsets and Frequency Levels
This paper presents a new clustering technique which is extended from the technique of clustering based on frequent-itemsets. Clustering based on frequent-itemsets has been used only in the domain of text documents and it does not consider frequency levels, which are the different levels of frequency of items in a data set. Our approach considers frequency levels together with frequent-itemsets...
متن کاملCandidate Cluster Extraction for Hierarchical Document Clustering
Text Document are tremendously increasing in the internet, the hierarchical document clustering has proven to be useful in grouping similar document for large applications. Still most documents suffer from problems of high dimensionality, scalability, accuracy and meaningful cluster labels. In this paper an new approach fuzzy frequent itemsets based hierarchical clustering is proposed, in which...
متن کاملText clustering using frequent itemsets
Frequent itemset originates from association rule mining. Recently, it has been applied in text mining such as document categorization, clustering, etc. In this paper, we conduct a study on text clustering using frequent itemsets. The main contribution of this paper is three manifolds. First, we present a review on existing methods of document clustering using frequent patterns. Second, a new m...
متن کاملAn Approach for Finding Frequent Item Set Done By Comparison Based Technique
Frequent pattern mining has been a focused theme in data mining research for over a decade. Abundant literature has been dedicated to this research and tremendous progress has been made, ranging from efficient and scalable algorithms for frequent itemsets mining in transaction databases to numerous research frontiers, such as sequential pattern mining, structured pattern mining, correlation min...
متن کامل